[WIP] 🌱 e2e: change behavior of VerifyMachinesReady to verify once after a successful list of machines#13015
Conversation
…ccessful list of machines
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| } | ||
| }, 5*time.Minute, 10*time.Second).Should(Succeed(), "Failed to list Machines to check the Ready condition for Cluster %s", klog.KRef(input.Namespace, input.Name)) | ||
|
|
||
| Expect(machineList.Items).ToNot(BeEmpty(), "No machines found for cluster %s", input.Name) |
There was a problem hiding this comment.
I'm not sure if this is desirable in that way considering the Prow CI infrastructure.
I would expect a certain amount of flakiness purely because of the infra we run on.
We can give this a try early next cycle though and then keep an eye on k8s-triage
Alternative to this would be to just reduce the timeout to e.g. 30s/1m. It would allow some sort of flakiness but not entire rollout sequences in progress.
Side note: In general we probably want to do the same for VerifyClusterCondition / VerifyClusterAvailable
What this PR does / why we need it:
The
VerifyMachinesReadyis used in places where we think a Cluster is a certain state.Instead of checking if the Machine's Ready conditions are true, the function currently waits up to five minutes for all Machine's Ready conditions to get true.
The logic currently is as follows:
cluster-api/test/framework/cluster_helpers.go
Lines 490 to 515 in 8dcfbaf
This changes the behavior to only retry the list api calls.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #
/area e2e-testing